49 research outputs found
An Evolutionary Algorithm for Discovering Multi-Relational Association Rules in the Semantic Web
International audienceIn the Semantic Web context, OWL ontologies represent the conceptualization of domains of interest while the corresponding assertional knowledge is given by RDF data referring to them. Because of its open, distributed, and collaborative nature, such knowledge can be incomplete, noisy, and sometimes inconsistent. By exploiting the evidence coming from the assertional data, we aim at discovering hidden knowledge patterns in the form of multi-relational association rules while taking advantage of the intensional knowledge available in ontological knowledge bases. An evolutionary search method applied to populated ontological knowledge bases is proposed for finding rules with a high inductive power. The proposed method, EDMAR, uses problem-aware genetic operators, echoing the refinement operators of ILP, and takes the intensional knowledge into account, which allows it to restrict and guide the search. Discovered rules are coded in SWRL, and as such they can be straightforwardly integrated within the ontology, thus enriching its expressive power and augmenting the assertional knowledge that can be derived. Additionally , discovered rules may also suggest new axioms to be added to the ontology. We performed experiments on publicly available ontologies, validating the performances of our approach and comparing them with the main state-of-the-art systems
Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited
Since the late 1990s predicate invention has been under-explored within inductive logic programming due to difficulties in formulating efficient search mechanisms. However, a recent paper demonstrated that both predicate invention and the learning of recursion can be efficiently implemented for regular and context-free grammars, by way of metalogical substitutions with respect to a modified Prolog meta-interpreter which acts as the learning engine. New predicate symbols are introduced as constants representing existentially quantified higher-order variables. The approach demonstrates that predicate invention can be treated as a form of higher-order logical reasoning. In this paper we generalise the approach of meta-interpretive learning (MIL) to that of learning higher-order dyadic datalog programs. We show that with an infinite signature the higher-order dyadic datalog class H2 2 has universal Turing expressivity though H2 2 is decidable given a finite signature. Additionally we show that Knuth–Bendix ordering of the hypothesis space together with logarithmic clause bounding allows our MIL implementation MetagolD to PAC-learn minimal cardinality H2 2 definitions. This result is consistent with our experiments which indicate that MetagolD efficiently learns compact H2 2 definitions involving predicate invention for learning robotic strategies, the East–West train challenge and NELL. Additionally higher-order concepts were learned in the NELL language learning domain. The Metagol code and datasets described in this paper have been made publicly available on a website to allow reproduction of results in this paper
Human-machine scientific discovery
International audienceHumanity is facing existential, societal challenges related to food security, ecosystem conservation, antimicrobial resistance, etc, and Artificial Intelligence (AI) is already playing an important role in tackling these new challenges. Most current AI approaches are limited when it comes to ‘knowledge transfer’ with humans, i.e. it is difficult to incorporate existing human knowledge and also the output knowledge is not human comprehensible. In this chapter we demonstrate how a combination of comprehensible machine learning, text-mining and domain knowledge could enhance human-machine collaboration for the purpose of automated scientific discovery where humans and computers jointly develop and evaluate scientific theories. As a case study, we describe a combination of logic-based machine learning (which included human-encoded ecological background knowledge) and text-mining from scientific publications (to verify machine-learned hypotheses) for the purpose of automated discovery of ecological interaction networks (food-webs) to detect change in agricultural ecosystems using the Farm Scale Evaluations (FSEs) of genetically modified herbicide-tolerant (GMHT) crops dataset. The results included novel food-web hypotheses, some confirmed by subsequent experimental studies (e.g. DNA analysis) and published in scientific journals. These machine-leaned food-webs were also used as the basis of a recent study revealing resilience of agro-ecosystems to changes in farming management using GMHT crops
Next-Generation Global Biomonitoring: Large-scale, Automated Reconstruction of Ecological Networks
We foresee a new global-scale, ecological approach to biomonitoring emerging within the next decade that can detect ecosystem change accurately, cheaply, and generically. Next-generation sequencing of DNA sampled from the Earth's environments would provide data for the relative abundance of operational taxonomic units or ecological functions. Machine-learning methods would then be used to reconstruct the ecological networks of interactions implicit in the raw NGS data. Ultimately, we envision the development of autonomous samplers that would sample nucleic acids and upload NGS sequence data to the cloud for network reconstruction. Large numbers of these samplers, in a global array, would allow sensitive automated biomonitoring of the Earth's major ecosystems at high spatial and temporal resolution, revolutionising our understanding of ecosystem change
On deducing causality in metabolic networks
© 2008 Bodei et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution Licens
Automated Discovery of Food Webs from Ecological Data Using Logic-Based Machine Learning
Networks of trophic links (food webs) are used to describe and understand mechanistic routes for translocation of energy (biomass) between species. However, a relatively low proportion of ecosystems have been studied using food web approaches due to difficulties in making observations on large numbers of species. In this paper we demonstrate that Machine Learning of food webs, using a logic-based approach called A/ILP, can generate plausible and testable food webs from field sample data. Our example data come from a national-scale Vortis suction sampling of invertebrates from arable fields in Great Britain. We found that 45 invertebrate species or taxa, representing approximately 25% of the sample and about 74% of the invertebrate individuals included in the learning, were hypothesized to be linked. As might be expected, detritivore Collembola were consistently the most important prey. Generalist and omnivorous carabid beetles were hypothesized to be the dominant predators of the system. We were, however, surprised by the importance of carabid larvae suggested by the machine learning as predators of a wide variety of prey. High probability links were hypothesized for widespread, potentially destabilizing, intra-guild predation; predictions that could be experimentally tested. Many of the high probability links in the model have already been observed or suggested for this system, supporting our contention that A/ILP learning can produce plausible food webs from sample data, independent of our preconceptions about “who eats whom.” Well-characterised links in the literature correspond with links ascribed with high probability through A/ILP. We believe that this very general Machine Learning approach has great power and could be used to extend and test our current theories of agricultural ecosystem dynamics and function. In particular, we believe it could be used to support the development of a wider theory of ecosystem responses to environmental change
Recommended from our members
Ultra-Strong Machine Learning: comprehensibility of programs learned with ILP
During the 1980s Michie defined Machine Learning in terms of two orthogonal axes of performance: predictive accuracy and comprehensibility of generated hypotheses. Since predictive accuracy was readily measurable and comprehensibility not so, later definitions in the 1990s, such as Mitchell’s, tended to use a one-dimensional approach to Machine Learning based solely on predictive accuracy, ultimately favouring statistical over symbolic Machine Learning approaches. In this paper we provide a definition of comprehensibility of hypotheses which can be estimated using human participant trials. We present two sets of experiments testing human comprehensibility of logic programs. In the first experiment we test human comprehensibility with and without predicate invention. Results indicate comprehensibility is affected not only by the complexity of the presented program but also by the existence of anonymous predicate symbols. In the second experiment we directly test whether any state-of-the-art ILP systems are ultra-strong learners in Michie’s sense, and select the Metagol system for use in humans trials. Results show participants were not able to learn the relational concept on their own from a set of examples but they were able to apply the relational definition provided by the ILP system correctly. This implies the existence of a class of relational concepts which are hard to acquire for humans, though easy to understand given an abstract explanation. We believe improved understanding of this class could have potential relevance to contexts involving human learning, teaching and verbal interaction